AD/A-002  743 

LEARNING-THEORETIC FOUNDATIONS OF
LINGUISTIC  UNIVERSALS 

Kenneth  Wexler,  et  al. 

California  University 


Prepared  for: 

Office  of  Naval  Research 
Advanced Research Projects Agency


1 November  1974 


DISTRIBUTED  BY: 


National Technical Information Service
U.  S.  DEPARTMENT  OF  COMMERCE 


REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: Technical Report No.
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Learning-Theoretic Foundations of Linguistic
   Universals
5. TYPE OF REPORT & PERIOD COVERED: Semi-Annual Technical Report
   (1 Oct 1973 - 31 Mar 1974)
6. PERFORMING ORG. REPORT NUMBER: Soc. Sci. Working Paper #60
7. AUTHOR(s): Kenneth Wexler, Peter Culicover, Henry Hamburger
8. CONTRACT OR GRANT NUMBER(s): N00014-69-A-0200-9006
9. PERFORMING ORGANIZATION NAME AND ADDRESS: School of Social Sciences,
   University of California, Irvine, CA 92664
11. CONTROLLING OFFICE NAME AND ADDRESS: Personnel and Training Research
    Programs, Office of Naval Research (Code 458), Arlington, VA 22217
12. REPORT DATE: November 1, 1974
13. NUMBER OF PAGES: 62
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
    different from Report):
18. SUPPLEMENTARY NOTES: Submitted to Theoretical Linguistics
19. KEY WORDS: linguistic universals, syntax, freezing principle, semantics,
    language acquisition, learning theory, mathematical linguistics

20. ABSTRACT

Some aspects of a theory of grammar are presented which derive from a formal
theory of language acquisition. One aspect of the theory is a universal
constraint on analyzability known as the Freezing Principle, which supplants
[...] of constraints proposed in the literature. A second aspect of the
theory is the Invariance Principle, a constraint on the relationship between
semantic and syntactic structure that makes verifiable predictions of
syntactic universals. The relationship between the notion of
"explanatory adequacy" of a theory of grammar and the learnability
of a class of transformational grammars is discussed.


LEARNING-THEORETIC  FOUNDATIONS 
OF 

LINGUISTIC  UNIVERSALS* 

K. Wexler, P. Culicover
and H. Hamburger


Social  Sciences  Working  Paper,  60 
July  1974 


School  of  Social  Sciences 
University  of  California,  Irvine 
Irvine,  California  92664 




I.  Introduction 

A. General objectives

We have achieved results in the realm of explanatory adequacy, a
subject which, in spite of its recognized centrality to linguistic theory,
has been largely neglected. On the other hand, two interacting shorter-range
goals have attracted considerably more attention from linguists.
These are descriptive adequacy and formal universals. Given that grammars
should consist of rules of certain forms, a linguist seeks a descriptively
adequate grammar of a particular language, a description of adult
competence. On the other hand (s)he may ask what forms rules should be allowed
to take. This latter task can be approached by noting which kinds of rules
seem to be universally useful for describing natural language. In this way,
universal formalism may be advanced.

Suppose that a universal set of rule types and conditions is found
which allows grammars to be constructed for many particular languages, and
that these grammars provide adequate descriptions and even insightful
generalizations about their respective languages. Even then, a puzzle
remains: why these particular formal universals? Are they an accident,
or do they have some special formal property which makes them particularly
appropriate? Chomsky (1965) argues that there is such a property which
distinguishes among formal universals and that in particular it has to do
with the fact that language must be learned by every child. He writes
(page 25):

To the extent that a linguistic theory succeeds in
selecting a descriptively adequate grammar on the
basis of primary linguistic data, we can say that it
meets the condition of explanatory adequacy.

We add to this requirement that the selection procedure be psychologically
plausible.

Here we shall attempt to be both plausible and detailed in showing
that the requirement of "learnability" can force a selection among formal
universals. Further, this research has yielded the particularly interesting
and unique result that a linguistic principle which was motivated by
abstract developments in language acquisition turns out to provide an
account of several adult syntactic structures which is descriptively more
satisfactory than previous accounts. If validated, this would be an
instance of the kind of scientific event in which a theoretical analysis
leads to an improved empirical account. Thus it is appropriate and, in
fact, important to proceed in this unified manner. Even if our linguistic
analysis should ultimately require modification, we consider it worth
explicating our work as one example of how one might go about achieving
explanatory adequacy. A more detailed presentation of various parts of
the theory with extensive discussion appears in various published and
unpublished papers, and a complete presentation will appear in a book
which is presently in preparation^.

B. Fundamental theoretical background

The major goal of linguistic theory is to characterize human language
in a way that is consistent with the fact that any child can learn any
human language, provided that he is born into a community where that
language is spoken. Thus our characterization of language must not call for a
potential range or complexity of structures that would necessarily bewilder
the child by virtue of being logically impossible to learn. To quote
Chomsky (1965, p. 58):

It is, for the present, impossible to formulate an assumption
about initial, innate structure rich enough to account
for the fact that grammatical knowledge is attained on the
basis of the evidence available to the learner.... The real
problem is that of developing a hypothesis about initial
structure that is sufficiently rich to account for acquisition
of language, yet not so rich as to be inconsistent
with the known diversity of language.

This goal has never been approached, and, in fact, linguists have
never seriously taken up the question of language learnability. Most
of the work by linguists with regard to discovering the formal constraints
on the structure of human language has been concerned with the inspection
of languages and the subsequent positing of constraints or universals on
the basis of such inspection. We will provide examples of such
investigations as they relate to our own work in Section II below.

On the other hand, it is also possible to consider the question of
linguistic constraints and universals by first establishing the
requirements which a plausible learning theory (of language) places on the
languages which it can learn. If a plausible learner cannot learn a
given type of language, then this constitutes evidence either that the
languages which we call "natural" languages are not of this type, or that
some refinement is required in our notion of plausible learner.

It is demonstrable (Gold 1967) that if there are no constraints
whatsoever on what kinds of grammars could be grammars of natural languages,
then no conceivable learning procedure could guess, from data from the
language, which one of the conceivable grammars was the grammar
corresponding to that language.
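
The flavor of this result can be illustrated with a minimal Python sketch. It is not Gold's proof, only a toy under assumed definitions: sentences over a one-letter alphabet are represented by their lengths, `L_n` is the finite language of all sentences up to length n, and `L_inf` contains every length. The names `cautious_learner`, `bold_learner` and `run` are hypothetical.

```python
from itertools import count, cycle, islice

L_INF = "L_inf"  # label for the infinite language {a, aa, aaa, ...}


def cautious_learner(seen):
    """Guess the smallest finite language L_n containing everything seen."""
    return max(seen)


def bold_learner(seen):
    """Always guess the infinite language."""
    return L_INF


def run(learner, text, steps):
    """Feed the first `steps` sentences of a positive-only text to a learner
    and return its guess after each sentence."""
    seen, guesses = [], []
    for sentence in islice(text, steps):
        seen.append(sentence)
        guesses.append(learner(seen))
    return guesses


# A text for L_3 repeats the lengths 1, 2, 3; a text for L_inf lists 1, 2, 3, ...
finite_run = run(cautious_learner, cycle([1, 2, 3]), 9)
infinite_run = run(cautious_learner, count(1), 9)
bold_run = run(bold_learner, cycle([1, 2, 3]), 9)

print(finite_run)    # settles on 3 once the longest sentence has appeared
print(infinite_run)  # keeps changing: a new guess after every sentence
print(bold_run)      # never leaves L_inf, so it is wrong on every finite L_n
```

Neither strategy identifies every language in this small class from sentences alone. The sketch shows only that these two learners fail; that no cleverer learner can succeed is what the cited result establishes.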

In Hamburger and Wexler (1973a,b) and Wexler and Hamburger (1973) a
model of a minimally plausible learner is constructed, and the question of
the learnability of various types of languages is then investigated. It
is shown that even if all human languages possessed the same deep structures,
and differed only in the transformations which constituted their grammars,
no conceivable learning procedure would be able to guess the correct
grammar of any such language given data from that language in the form of
grammatical sentences. Furthermore, it was demonstrated that a minimally
plausible learning procedure can learn the grammar of a language if (a)
the procedure is presented with the semantic interpretation of a sentence
when the sentence is presented, and (b) certain formal constraints are
placed on the applicability of transformations. We will describe these
results and possible extensions of them more fully in Section II below.

It follows from the work just mentioned that a theory of grammar
learning is a theory of grammar in that a precise specification of the
learner leads to a specification of the class of things that are
learnable. Hence a correct specification of the procedure by which human
beings learn the grammars of languages will lead to a specification of the
class of possible human languages.

C.  Methodology 

A fundamental requirement of the theory is that the learning procedure
be plausible. It is necessary, therefore, to append to a minimal learning
procedure more sophisticated notions of memory, attention, self-correction,
external correction, rate of learning, type of input, cognitive capacity,
etc. Ideally, the plausible learner should behave just like the child in
an empirically defined language learning environment with respect to all
these factors.

A second requirement of the theory is that for the constraints placed
on the class of languages by the learning procedure, all the available
phenomena from natural language support their adoption as constraints on
natural languages. In fact, we wish to show that such constraints regularly
produce the deepest and most compelling explanation (to the linguist) of
the linguistic data. It is therefore of considerable importance to conduct
a systematic investigation of well-known (and new) syntactic phenomena in
natural language which might provide evidence in support of or in
opposition to the precise constraints arising from the learning theory.
Some work of this nature is described in Section III.

A third requirement of the theory is that the constraints arrived
at, as well as the specification of the learning theory, be universal,
and that all implications which arise from these specifications also be
universal. In particular, we assume for the purposes of maintaining a
plausible learning procedure that there exists a universal constraint on
the relationship between semantic and syntactic structure. Assuming
that semantic structure is universal, this leads to a number of predicted
universals of syntactic structure. Hence we are also concerned with
investigating a variety of the world's languages to determine the plausibility of
such putative universals. We discuss this further in Section IV.

Finally, a requirement of the theory is that it make only correct
predictions about the actual course of language development in the child. We
have not constructed experimental situations in which such predictions are
tested. Rather we are concerned with the more primary task of constructing
firm and falsifiable predictions, and seek to discover evidence which bears
on them in the literature on developmental psycholinguistics. We discuss
these questions in more detail in Section V.


II. Learnability Theory

A. Theories of language acquisition

A theory of (first) language acquisition defines a procedure which
models the essential characteristics of how the child acquires his
language. This procedure must be powerful enough to learn any natural
human language, since we start with the fundamental observation that any
normal child can learn any natural language, given the proper environment.
That this requirement (of learnability) is difficult to attain is evident
from the fact that no existing theory of language acquisition comes close
to satisfying it.

By far the bulk of work in the study of language acquisition involves
the description of the child's linguistic knowledge at various ages. From
this work a number of interesting generalizations may be drawn about the
child's language. But very little attention has been given to a dynamic
theory; that is, a theory of how, given the input that is available to him,
the child arrives at an adult's knowledge of language.

A few studies (an important one is Brown and Hanlon 1970) have asked
the question: why does a child learn language? That is, what compels a
child to change his grammar over time? Although very important, this
question is only a part of the problem of the study of language acquisition.
Even if we had an unequivocal answer to this question we would still not
know what the procedure is which the child uses to construct his grammar.
(That is, we would not know how a child learns his language.)

When we come to those studies in the language acquisition literature
which attempt to sketch a theory, that is, those proposals which suggest a
procedure, we find a number of proposals, but none of the proposals meets
the first requirement stated above; that is, none of the theorists attempts
to show that the procedure is strong enough to learn all human languages,
given what we know about human language. In fact, the theories are either
too vague for the question to be seriously asked, or they are clearly too
weak to learn any substantial amount of syntax.

The common methodology which most of these studies of the theory of
language acquisition adopt is to take some description of the speech of a
child at an early age and to then hypothesize a way in which that speech
could have been learned. This is true, for example, of McNeill (1966) and
Braine (1963). The correct description of children's knowledge of language
at a given age is not easy to attain, and this can cause problems. Thus
Braine (1963) outlines a theory of how a pivot grammar might be learned,
but Bloom (1970) and Brown (1973) show quite clearly that pivot grammars
are not appropriate models of children's language.

For the problem of learning transformations we find little help in the
literature. Although the construction of an "evaluation procedure" is taken
as a central goal of linguistics, no linguist has offered a procedure and
demonstrated that it can converge to a correct grammar. In the field of
language acquisition, McNeill (1966) discusses the learning of
transformations and offers a hypothesis (namely, that transformations reduce memory
load) as to why they are acquired. But he offers no hypothesis about the
procedure by which they are acquired, and, therefore, no proof that a given
procedure is strong enough to learn language. Fodor (1966) recognizes the
difficulty of the problem and suggests one strategy, which he claims might
account for one very small part of the procedure wherein base structures
are "induced" from surface strings, but no proof of success is given. Slobin
(1973) suggests such "operating principles" as "pay attention to the order of
words and morphemes", but no more explicit procedures nor outline of a proof
of success are proposed. Braine (1971) offers some hints at a
"discovery-procedures" model, and applies the model to some simple examples, but the
model is certainly not strong enough to have success claimed for it. In
most other studies (there are a large number of them; see Ferguson and
Slobin 1973 for a bibliography), no hypotheses about learning procedures
are suggested.

The field of computer simulation also provides little insight. Kelley
(1967) has written a language learning program which deals with only the
simplest stages of language acquisition and which makes no mention of
transformations nor of the phenomena accounted for by transformations. The only
grammatical hypotheses which his learner can make represent contingencies
between adjacent elements in phrase-markers, far too weak to account for the
learning of transformations. Also, as is common with simulation studies, it
is not clear exactly what the program can do.

Klein and Kuppin (1970) have written a program to learn transformational
grammar. The program is intended to be more a model of the linguistic
field-worker than of the child learning a first language. Again, it is not clear
what the program can learn. A few simple examples are given, but the range
of the program is undefined. Indeed, the authors call the program "heuristic"
because it does not guarantee success. It seems to us that heuristic (in this
sense) programs might be acceptable as models of humans in situations where
humans may, indeed, fail (say, problem solving, or the discovery of scientific
theories, or writing a grammar as a field-worker for some foreign language,
which, in fact, is Klein and Kuppin's situation), but the fundamental
assumption in the study of language acquisition is that every normal child succeeds.
Thus we must have what Klein and Kuppin call "algorithmic" procedures,
ones for which success is guaranteed. (Note that Klein and Kuppin's
sense of the term is not necessarily the sense in common usage in the
field of artificial intelligence.)

Klein and Kuppin make a number of assumptions which would be quite
implausible in models of a child learning a first language. First, they
assume that the learner receives information about what strings are
non-sentences. Although this information may be available to a field-worker,
it is probably not available to a child (Brown and Hanlon 1970; Braine 1971;
Ervin-Tripp 1971). Second, they assume that the learner can remember and use
all data it has ever received. Third, each time the learner hypothesizes
a new transformation it tests it extensively.

All these assumed capacities of [...] unavailable to
the child. On the other hand, only obligatory, ordered transformations are
allowed, so that the class of grammars is not rich enough to describe all
natural languages. Still, there is no reason to believe that Klein and
Kuppin's learner can learn an arbitrary grammar of the kind they assume.

Gold (1967) provided a formal definition of language learning and showed
that according to this definition most classes of languages (including the
finite state languages and thus any super-class of these such as the
context-free languages) were not learnable if only instances of grammatical sentences
were presented. Many of these language classes are learnable if "negative
information", that is, instances of non-sentences, identified as such, are
also presented. However, as noted above the evidence is that children do
not receive such negative information. Any theory of language learning
which depends heavily upon negative information will probably turn out to
be incorrect and will very likely not yield insights on formal grammatical
universals. With such a powerful input, what constraints actually exist
will be unnecessary.

Other studies on grammar learning have been made by Feldman (1967,
1969), Feldman et al. (1969), and Horning (1969). These studies, while
interesting in themselves, do not deal with the question of learning
systems which linguists argue are necessary for natural language (e.g.,
transformations).

B. Formal results on learnability

The absence of linguistically relevant results in learnability theory
led us to study the learnability of transformational grammars. Since each
transformational grammar includes a phrase-structure grammar as a part of
it, Gold's results would seem to preclude learnability from information
consisting only of sentences. At this point there are two ways to proceed:
either restrict the class of grammars or enrich the information. We will
discuss each of these possibilities in turn.

The first approach (Wexler and Hamburger 1973) is to try to restrict
the class of grammars to achieve learnability from the presentation of
grammatical sentences only. We showed that even a very severe restriction
on the grammars did not give learnability. Specifically we required that
there be a universal context-free base grammar and that each language in
the class of languages be defined by a finite set of transformations on
this base grammar. If the base is taken as universal, then it may
conceivably be regarded as innate, and hence need not be learned. Still remaining
to be learned, however, are the particular transformations that appear in
the language to be learned. Linguists are in broad agreement (a possible
exception is Bach 1965) that most of these at least must be learned. Thus
by assuming a universal base, we make the learner's task as easy as we can,
without trivializing it. Still we obtained a negative result; that is, we
proved that, given sentences as data, no learner could succeed in learning
an arbitrary language of this kind.

It is important to stress that the function of making over-strong
assumptions when we are obtaining negative results is not to claim that
the over-strong assumptions are correct, but to show that even with these
over-strong assumptions the class is unlearnable, and thus without them it
is also unlearnable. For example, here we made the too-strong assumption
of a universal base and showed non-learnability of certain classes of
transformational languages. Thus without a universal base such classes are
a fortiori unlearnable.

The next step (Hamburger and Wexler 1973a,b) was to enrich the
information presentation scheme in an attempt to achieve a positive result. We
thus made the assumption that given the situational context of a sentence
the learner had the ability to infer an interpretation of the sentence and
from the interpretation to infer its deep structure. Now this is a very
strong assumption (Chomsky 1965 notes that it is very strong, though not
necessarily wrong), and we have already begun to weaken it further. But
the important point is that we finally achieved a positive result. That is,
if we assume that the information scheme is a sequence of (b,s) pairs where
b is a base phrase-marker and s is the corresponding surface sentence (not
the surface phrase-marker, since there is no reason to assume that this
information is available to the learner in complete detail), a procedure can
be constructed which will learn any finite set of transformations which satisfy
the assumed constraints.

By "learn" we mean that the procedure will eventually (at some finite
time) select a correct set of transformations and will not change its
selection after that time. For a sketch of the proof and a discussion
of assumptions, see Hamburger and Wexler (1973a). For the complete proof,
see Hamburger and Wexler (1973b).
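
The following Python sketch shows what learning from (b,s) pairs, in this sense of "learn", can look like in a toy setting. It does not reproduce the procedure of the cited papers. Everything here is an assumption made for illustration: base structures are flat tuples of tokens rather than phrase-markers, the hypothesis space is four hypothetical string operations (`CANDIDATES`), and the learner keeps no record of past data, changing its guess at random only when the current guess mispredicts the surface string.

```python
import random

# Hypothetical candidate "transformations": each maps a base string
# (a tuple of tokens) to a surface string.
CANDIDATES = {
    "identity": lambda b: b,
    "front_last": lambda b: (b[-1],) + b[:-1],
    "swap_first_two": lambda b: (b[1], b[0]) + b[2:],
    "reverse": lambda b: b[::-1],
}

BASES = [("a", "b"), ("a", "b", "c"), ("a", "b", "c", "d")]


def learn(target, steps, seed=0):
    """Error-driven learner with no memory for past data.

    Returns the final hypothesis and the step at which it last changed."""
    rng = random.Random(seed)
    names = sorted(CANDIDATES)
    hypothesis = "identity"
    last_change = 0
    for step in range(1, steps + 1):
        b = rng.choice(BASES)                # base structure presented
        s = CANDIDATES[target](b)            # its surface sentence
        if CANDIDATES[hypothesis](b) != s:   # current guess fails on (b, s)
            new = rng.choice(names)          # pick a candidate at random
            if new != hypothesis:
                hypothesis, last_change = new, step
    return hypothesis, last_change


final, last_change = learn("front_last", steps=500)
print(final, last_change)
```

Once the learner happens to select the target it never errs again, so its selection stops changing after some finite step, which is the convergence criterion stated above. The toy succeeds only because its hypothesis space is tiny and every wrong candidate is eventually contradicted by some datum.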

In the event that the reader thinks that with these strong assumptions
the proof of learnability is easy and straightforward he should look at the
proof of the learnability theorem in Hamburger and Wexler (1973b). As
Peters (1972) notes, the power of transformations that have been assumed is
far too large. And, in fact, in addition to assumptions made (explicitly
or implicitly) in Chomsky (1965) (for example, all recursion in the base
takes place through S, and transformations are cyclic), it was necessary to
make six special assumptions in order to derive the result. The first,
called the Binary Principle, states that no transformation may analyze more
deeply than two S's down. It is quite significant that this principle,
assumed for the proof of the learnability theorem, was later proposed
independently on purely descriptive grounds by Chomsky (1973), who called it the
"Subjacency" Condition. We have since found further descriptive evidence
for it. We propose that the reason that the Binary Principle exists is that
without it natural language would be unlearnable. The fact that the Binary
Principle is necessary both for learning and descriptive reasons lends strong
support to its status as a formal linguistic universal. (It should be noted
that the descriptive arguments are controversial; see Postal (1972) for
arguments that transformations must analyze more deeply.)
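
As a rough illustration of the Binary Principle, the Python sketch below collects the words that remain visible to a transformation applying at the topmost S. It rests on assumptions that are not in the text: phrase-markers are encoded as nested `(label, child, ...)` tuples, "two S's down" is read as the S where the transformation applies plus the S immediately embedded in it, and only words (not the embedded S node itself) are reported. The function name `analyzable_words` is hypothetical.

```python
def analyzable_words(node, s_levels_seen=0, max_s_levels=2):
    """Collect the words a transformation could analyze when it applies
    at the root S, looking no deeper than `max_s_levels` S nodes."""
    if isinstance(node, str):          # a leaf is a word
        return [node]
    label, *children = node
    if label == "S":
        s_levels_seen += 1
    if s_levels_seen > max_s_levels:   # a third S down: contents are invisible
        return []
    words = []
    for child in children:
        words.extend(analyzable_words(child, s_levels_seen, max_s_levels))
    return words


# "John thinks [Mary said [Bill left]]", built from the innermost clause out.
INNERMOST = ("S", ("NP", "Bill"), ("VP", ("V", "left")))
MIDDLE = ("S", ("NP", "Mary"), ("VP", ("V", "said"), INNERMOST))
TREE = ("S", ("NP", "John"), ("VP", ("V", "thinks"), MIDDLE))

print(analyzable_words(TREE))  # ['John', 'thinks', 'Mary', 'said']
```

Under this reading, material in the most deeply embedded clause cannot be mentioned by a transformation on the top cycle, which bounds how far down the learner must look when hypothesizing a transformation.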

The other assumptions are all motivated by the fact that, even with the
Binary Principle, the number of possible structural analyses is unbounded,
so that the learning procedure can be led astray. We therefore made some
rather brute-force assumptions about the analyzability of certain nodes
after raising and some other operations. (For the explicit definition of
these five assumptions see Hamburger and Wexler 1973b.)

Even though these five extra assumptions enabled us to show
learnability, there was one rather unsatisfying feature of the result. We
showed that the average number of data it took for the learner to get to a
correct grammar was less than a certain upper bound, but this bound was
very high in comparison to the number of sentences a child hears in the
few years it takes him to learn his language.

It was therefore extremely compelling for us to discover later that
the five assumptions can be replaced by a single constraint called the
Freezing Principle (see Section III, Wexler and Culicover 1973, Culicover
and Wexler 1973, 1974a) which still allows the learnability theorem to be
proved and which has the following properties (compared to the
original five assumptions):

1. a) It is more simply and elegantly stated and in more
      "linguistic" terms.

   b) The proof of the learnability theorem is much more
      natural and simple.

2. It provides a better description of English, and in fact
   is more adequate in explaining judgments of grammaticality
   in English for a crucial class of phenomena than
   other constraints considered in linguistics to date.

3. The learning procedure is simplified and is more plausible
   as a model of the child.

4. All transformations can be learned from data of degree 0,
   1 or 2; that is, the learner does not have to consider
   sentences which contain sentences which contain
   sentences which contain sentences, or sentences more
   complex than these. This result permits a drastically
   reduced bound on expected learning time. (Result 4 only
   holds with added assumptions, interesting in themselves.)
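
The notion of degree used in result 4 can be made concrete with a short Python sketch. The encoding is an assumption for illustration: phrase-markers are nested `(label, child, ...)` tuples, and the degree of a sentence is taken to be the number of levels of S embedded below the root S, so that a simple sentence has degree 0. The helper names `s_depth`, `degree` and `clause` are hypothetical.

```python
def s_depth(node):
    """Largest number of S nodes on any path from this node down to a word."""
    if isinstance(node, str):
        return 0
    label, *children = node
    deepest = max((s_depth(child) for child in children), default=0)
    return deepest + (1 if label == "S" else 0)


def degree(tree):
    """Degree of a phrase-marker: levels of S embedded below the root S."""
    return s_depth(tree) - 1


def clause(subject, verb, complement=None):
    """Build a toy S node, optionally with an embedded S as complement."""
    if complement is None:
        vp = ("VP", ("V", verb))
    else:
        vp = ("VP", ("V", verb), complement)
    return ("S", ("NP", subject), vp)


d0 = clause("Bill", "left")          # degree 0
d1 = clause("Mary", "said", d0)      # degree 1
d2 = clause("John", "thinks", d1)    # degree 2
d3 = clause("Sue", "knows", d2)      # degree 3: four nested sentences

data = [d0, d1, d2, d3]
usable = [t for t in data if degree(t) <= 2]
print([degree(t) for t in data])  # [0, 1, 2, 3]
print(len(usable))                # 3
```

A learner restricted in this way can ignore `d3` and anything more deeply embedded, which is what makes a much smaller bound on expected learning time possible.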

These results (especially, from the standpoint of learning, the third and
fourth) lend strong credence to the Freezing Principle. As a side-light,
it is quite interesting to observe that neither the Freezing Principle nor
the set of five assumptions is stronger than the other in terms of generative
capacity. That is, each allows derivations that the other does not allow.
Thus the crucial questions in language acquisition and linguistic theory
do not depend on the grammatical hierarchy and thus bear out the
conjecture of Chomsky (1965, p. 62) who wrote:

It is important to keep the requirements of explanatory
adequacy and feasibility in mind when weak and strong generative
capacities of theories are studied as mathematical questions.
Thus one can construct hierarchies of grammatical
theories in terms of weak and strong generative capacity,
but it is important to bear in mind that these hierarchies do
not necessarily correspond to what is probably the empirically
most significant dimension of increasing power of linguistic
theory. This dimension is presumably to be defined in terms
of the scattering in value of grammars compatible with fixed
data. Along this empirically significant dimension, we should
like to accept the least "powerful" theory that is empirically
adequate. It might conceivably turn out that this theory is
extremely powerful (perhaps even universal, that is, equivalent
in generative capacity to the theory of Turing machines)
along the dimension of weak generative capacity, and even along
the dimension of strong generative capacity. It will not
necessarily follow that it is very powerful (and hence to be
discounted) in the dimension which is ultimately of real
empirical significance.

It is further evidence for the Freezing Principle that it turns out to
be quite powerful in just this way. As we have written (Wexler and
Culicover 1973, p. 21):

In fact, we aim to show that a version of the Freezing
Principle is a fundamental component of the evaluation
metric for syntactic descriptions: by assuming the
Principle we are forced into rather particular descriptions.
Unlike some of current linguistic theory, a
theory with the Freezing Principle is not at all neutral
with respect to alternative descriptions in general, but
makes unequivocal statements as to which of the alternatives
is correct in most cases.

The Freezing Principle is thus unique among linguistic constructs
in that it is supported both by learning-theoretic and by descriptive
linguistic arguments. Such merging of these two kinds of arguments elevates
the discussion to the level of "explanatory adequacy" (Chomsky,
1965).

We propose the Freezing Principle as a formal universal of language
and claim as evidence for it that (a) it plays a key role in making
language learnable in a reasonable amount of time, while at the same time
(b) it also provides, in our opinion, the best available syntactic description
for a wide variety of adult linguistic data. By simultaneously satisfying
these two criteria, this theory begins to explain why adult language has
the structure it does, rather than merely describing that structure.

A major controversy in the study of the theory of language acquisition
in recent years has been the question of whether formal structural
universals had to be innate in the human child or whether only general cognitive
learning abilities were required, as argued, for example, in Putnam
(1967). It seems to us that our work provides evidence for the formal universal
position since, without assuming the existence of formal universals,
we cannot show that language is learnable. We did not come to this conclusion
a priori; rather the study of learnability theory forced it on us.

Also, it should be noted that in order to obtain the proof of the learnability
theorems we had to construct an explicit procedure which can be taken as
a model of some aspects of the child learning language. This procedure
contains a number of aspects which might reasonably be called parts of a
"general learning strategy". For example, the procedure forms hypotheses
based upon the evidence with which it is presented and changes these
hypotheses when evidence counter to them is presented. It is conceivable
that this kind of learning is operative in many cognitive domains but that
the particular formal structure of the objects upon which hypotheses are
formed or which constitute data is different in the various domains.² At
any rate, to our knowledge, no "general learning strategies" theory exists
which has been proved to be successful in learning language, or even a
significant part of it.

Recall that we require not only that the learning procedure converge
to an appropriate grammar, but that it do so in a "reasonable" way, that is,
by being in at least approximate accord with the evidence as to how human
children learn language. The fact that the procedure is able to learn
from degree 0, 1 and 2 data is in accord with this requirement. But there
are, of course, other properties of the procedure which must meet the
requirement. The procedure works by always hypothesizing a finite set of
transformations (the transformational component). If at any time a (b,s) pair
is presented which is not correctly handled by the current component, either
a) one of the current transformations is rejected from the component or b)
one is added. This is, of course, done in a reasonable, not arbitrary,
manner. In this way, a correct set of transformations is eventually obtained.
This last statement, of course, requires a long and complex proof.
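
The shape of one step of such a procedure can be sketched as follows. This is only a toy illustration and not the procedure used in the proof: it assumes that base and surface structures are tuples of words rather than phrase-markers, that the candidate transformations are the two hypothetical functions `front_wh` and `drop_that`, and that the learner picks at random among the single changes open to it. What it does share with the description above is that the component changes by at most one transformation per datum.

```python
import random

# Hypothetical toy transformations over tuples of words.
def front_wh(words):
    """Move the first non-initial wh-word to the front."""
    for i, w in enumerate(words):
        if w in ("who", "what") and i > 0:
            return (w,) + words[:i] + words[i + 1:]
    return words


def drop_that(words):
    """Delete the complementizer 'that'."""
    return tuple(w for w in words if w != "that")


CANDIDATES = {"front_wh": front_wh, "drop_that": drop_that}


def derive(component, base):
    """Apply each transformation in the component (fixed order).

    Returns the derived string and the names of the rules that changed it.
    """
    applied, words = [], base
    for name in sorted(component):
        new = CANDIDATES[name](words)
        if new != words:
            applied.append(name)
        words = new
    return words, applied


def update(component, datum, rng):
    """One learning step on a (b, s) pair; changes at most one rule."""
    base, surface = datum
    derived, applied = derive(component, base)
    if derived == surface:
        return component                      # datum handled: no change
    # a) reject one transformation that took part in the bad derivation
    options = [component - {name} for name in applied]
    # b) add one transformation that makes this datum come out right
    for name in CANDIDATES:
        if name not in component:
            if derive(component | {name}, base)[0] == surface:
                options.append(component | {name})
    return rng.choice(options) if options else component
```

Note that `update` consults only the current component and the current datum, so nothing about earlier data needs to be stored.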

Note that this procedure has two properties which are quite desirable.
First, only one transformation at a time is changed. This seems more in
accord with what we observe in the child's developing grammar than would
the wholesale rejection of transformational components called for by Gold's
(1967) methods. Although the grammar changes gradually (rule-by-rule), the
language (i.e., the set of sentences) may exhibit discontinuities over time
in that the change of one rule may affect a large number of different kinds
of sentences. This is exactly as we would expect from studies of children's
grammar.

Secondly, the procedure does not have to store the data with which it
has been presented. (Such storage is a feature both of Gold's formal studies
and of Klein and Kuppin's simulations.) Rather it determines the new
transformational component completely on the basis of the current transformational
component plus the current datum. This is desirable because it is
quite unlikely that the child explicitly remembers all the sentences he has
heard. As Braine (1971) notes:

The human discovery procedure obviously differs in many respects
from the kinds of procedures envisaged by Harris (1951), and
others.... A more interesting and particularly noteworthy difference,
it seems to me, is that the procedure must be able to
accept a corpus utterance by utterance, processing and forgetting
each utterance before the next is accepted, i.e., two utterances
of the corpus should rarely, if ever, be directly compared with
each other. Unlike the linguist, the child cannot survey all his
corpus at once. Note that this restriction does not mean that
two sentences are never compared with each other; it means, rather,
that if two sentences are compared, one of them is self-generated
from those rules that have already been acquired.

The fact that transformational components are learnable even given these
two rather severe restrictions on the procedure lends further support to
the theory.

III. Syntax

A.  The  Freezing  Principle 

The Freezing Principle enters into a descriptive account of English
as a universal constraint on the operation of transformational rules.
There is one crucial difference between the Freezing Principle and other
constraints on the application of transformations which have been proposed
in the literature; namely, the Freezing Principle emerges from a
theoretical analysis of the foundations of linguistic theory (i.e., learnability
studies), while other constraints are (more or less abstract)
generalizations from the data of syntactic description.³ The Freezing
Principle also turns out, we believe, to be more descriptively adequate
than other constraints proposed in the literature.

Before stating the Freezing Principle, we state a few of the assumptions
of syntactic theory. The theory (in the by now well-known notation⁴)
assumes that context-free phrase-structure rules (the base) generate
phrase-markers (trees). (These trees are ordered; this assumption will be
modified in the next section.) In the derivation of any sentence s, let
P₀ be the phrase-marker generated by the base, that is, the deep structure
of s.⁵ Then a transformation changes P₀ to the phrase-marker P₁, another
transformation changes P₁ to P₂, and so on, until Pₙ, the surface structure
of s, is reached. The terminal string of Pₙ is s. P₁, P₂, ..., Pₙ are called
derived phrase-markers.

For nodes A and B in a phrase-marker we have the notion A dominates B,
where the root (i.e., the highest S-node) dominates all other nodes. We
mean strictly dominate, so that A does not dominate A. If A dominates B
and there is no node C so that A dominates C and C dominates B, then we say
A immediately dominates B. The immediate structure of A is the sub-phrase-marker
consisting of A, the nodes A₁ ... Aₙ that A immediately dominates,
in order, and the connecting branches. The immediate structure of A is a
base immediate structure if A → A₁ ... Aₙ is a base rule. Otherwise it
is non-base. Before formally stating the Freezing Principle we will
illustrate its application to some particularly clear and simple data,
for which no explanation other than the Freezing Principle has (so far
as we know) ever been proposed. In fact these observations have not, as far
as we know, ever been made before.⁶
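
The dominance definitions above translate directly into a few predicates. The sketch below is illustrative only: the `Node` class, the function names, and the one-rule base passed to `is_base` are assumptions made for the example, since the report does not list its phrase-structure rules at this point.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # nodes are compared by identity, not by label
class Node:
    label: str
    children: list = field(default_factory=list)


def dominates(a, b):
    """Strict dominance: a node never dominates itself."""
    return any(c is b or dominates(c, b) for c in a.children)


def immediately_dominates(a, b):
    """A dominates B with no node C in between."""
    return any(c is b for c in a.children)


def immediate_structure(a):
    """The label of A together with the ordered labels of its daughters."""
    return (a.label, tuple(c.label for c in a.children))


def is_base(a, base_rules):
    """True if A -> A1 ... An is among the given base rules."""
    return immediate_structure(a) in base_rules


# Toy phrase-marker: S -> NP VP, VP -> V NP PP.
v, obj, pp = Node("V"), Node("NP"), Node("PP")
vp = Node("VP", [v, obj, pp])
s = Node("S", [Node("NP"), vp])
shifted_vp = Node("VP", [Node("V"), Node("PP"), Node("NP")])
TOY_BASE = {("S", ("NP", "VP")), ("VP", ("V", "NP", "PP"))}
```

Because daughters are compared in order, `shifted_vp` has a non-base immediate structure under this toy base even though it contains the same categories as `vp`.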

There is a transformation called COMPLEX NP SHIFT which moves a complex
NP (i.e., one which immediately dominates an S) to the end of its verb phrase,
as illustrated in (1).

(1a) John gave [the poisoned candy which he received in the
mail] to the police.

(1b) John gave to the police [the poisoned candy which he
received in the mail].

(The brackets indicate the substring which comprises the complex NP in
(1).) Ross (1967:51ff) has shown that the rule applies to a structure
with constituents ordered as in (1a) to produce a structure with constituents
ordered as in (1b).

A surprising fact is that there can be no movement of the object
of the to-phrase (henceforth the "indirect object") just in case COMPLEX
NP SHIFT has applied first. Compare (2a) and (2b). ("∅" indicates the
underlying location of the moved constituent, which is underlined.)

(2a) Who did John give [the poisoned candy which he received
in the mail] to ∅?

(2b) * Who did John give to ∅ [the poisoned candy which he
received in the mail]?

Similar facts hold for relative clauses.

(3a) The police who John gave [the poisoned candy which he
received in the mail] to ∅ were astounded by his bad luck.

(3b) * The police who John gave to ∅ [the poisoned candy which
he received in the mail] were astounded by his bad luck.

At first sight it might seem as if there might be a number of possible
explanations of these facts. In Wexler and Culicover (1973), however, we
offer evidence and arguments to rule out possible explanations involving
currently available devices of linguistic theory. These include rule
ordering, global derivational constraints and perceptual strategies.

The Freezing Principle, however, works perfectly here. The Freezing
Principle essentially says that if a structure has been transformed so that
it is no longer a base structure (i.e., generable by the phrase-structure
rules) then no further transformation may apply to that structure. To see
how this applies to these data, note how the transformation of COMPLEX
NP SHIFT affects the phrase-marker (4).

In the derived phrase-marker VP immediately dominates the sequence
V PP NP. But this VP is not a base structure; that is, there is
no phrase-structure rule in the base component of the form VP → V PP NP.
Thus we say that VP is "frozen", which means that no transformation may
analyze any node which VP dominates. (To indicate that VP is frozen we
place a box around it.) In particular no transformation may analyze the NP
of the to-phrase, since it is under VP. Thus WH-FRONTING may not apply, and
(2b) and (3b) are ungrammatical.

To give a more formal account of the Freezing Principle we first make the
following definition of a frozen node.

Definition: If the immediate structure of a node in a derived phrase-marker
is non-base then that node is frozen.

We can then state the

Freezing Principle: If a node X of a phrase-marker is frozen, then
no node which X dominates may be analyzed by a
transformation.

Note that no node which X dominates may be analyzed, not just the nodes
which X immediately dominates. Also note that by this definition, since
X does not dominate X, if X is frozen, it may itself be analyzed by a
transformation (unless some Y which dominates X is also frozen).

Notation: A box around a node X in a phrase-marker P, i.e., |X|,
indicates that X is frozen.

[Example phrase-marker not reproduced: a tree whose bottom row is K L M N,
in which the boxed node C immediately dominates G and H, and M and N lie
below them.]

In this example, C is frozen, i.e., C → G H is not a base rule. Thus the
nodes labelled G, H, M, and N may not be analyzed by a transformation.

The Freezing Principle blocks the application of all transformations
to parts of a phrase-marker. It does this by freezing certain nodes. If
a transformation distorts the structure of a node so that it is no longer
a base structure, then no further transformation may apply to elements
beneath that node.

This definition captures formally our discussion of the COMPLEX NP
SHIFT data. Note in particular that only VP is frozen, so that the subject
of the sentence may be questioned or relativized.

(5a) Who gave to the police the poisoned candy which John
received in the mail?

(5b) The man who gave to the police the poisoned candy which
John received in the mail was his brother.
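
The definition of a frozen node and the Freezing Principle can be checked mechanically on the COMPLEX NP SHIFT data. The following is a minimal sketch under stated assumptions: the three-rule base is a hypothetical toy base, the `name` tags exist only to identify nodes in the example, and the complex NP is left unexpanded to keep the trees small.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    name: str = ""            # hypothetical tag, used only in this sketch


# Hypothetical toy base; the report does not list its base rules here.
BASE_RULES = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
}


def is_frozen(node):
    """A node is frozen if its immediate structure is non-base."""
    if not node.children:
        return False
    shape = (node.label, tuple(c.label for c in node.children))
    return shape not in BASE_RULES


def analyzable(root):
    """Tagged nodes that are not dominated by any frozen node."""
    result = []

    def walk(node, blocked):
        if node.name and not blocked:
            result.append(node.name)
        blocked = blocked or is_frozen(node)   # freezing affects what lies below
        for child in node.children:
            walk(child, blocked)

    walk(root, False)
    return result


def sentence(shifted):
    """'John gave X to the police', before or after COMPLEX NP SHIFT."""
    subject = Node("NP", name="subject")
    complex_np = Node("NP", name="complex_np")
    to_phrase = Node("PP", [Node("P"), Node("NP", name="indirect_object")])
    objects = [to_phrase, complex_np] if shifted else [complex_np, to_phrase]
    return Node("S", [subject, Node("VP", [Node("V")] + objects)])
```

In the shifted structure only the subject remains available to a rule such as WH-FRONTING, which matches the contrast between (2b), (3b) and (5).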

B.  Some  empirical  justification 

We have shown in Wexler and Culicover (1973) and Culicover and Wexler
(1973, 1974a) that the Freezing Principle applies to a wide variety of
apparently unrelated syntactic domains. These include adverb placement,
GAPPING, WH-FRONTING, deletion rules, "seems", DATIVE, and many more.
Many of the arguments are rather complex, and require the presentation of
considerably more data than this exposition can comfortably accommodate.
We will restrict ourselves here to the development of several of these
cases.

The first case illustrates that the Freezing Principle explains
phenomena resistant to some of the most successful constraints on the
application of transformations proposed to date. It is a well known fact
that a constituent of a complement sentence may be questioned and relativized,
except when the sentence is a subject complement. Thus,

(6a) It is obvious [S that Sam is going to marry Susan].

(6b) Who is it obvious [S that Sam is going to marry ∅]?

(6c) Susan is the girl who it is obvious [S that Sam is
going to marry ∅].

(7a) [S that Sam is going to marry Susan] is obvious.

(7b) *Who is [S that Sam is going to marry ∅] obvious?

(7c) *Susan is the girl who [S that Sam is going to marry ∅]
is obvious.

Similar results obtain with the comparative, which Bresnan (1972)
argues involves deletion in the than-clause.

(8a) John is dumber than it is conceivable [S that George could
ever be ∅].

(8b) *John is dumber than [S that George could ever be ∅] is
conceivable.

The usual explanation of these facts is the A-over-A constraint
(Chomsky 1964, 1968:43), which requires that an extraction transformation
applying to a phrase of type A such as the one illustrated in (6) - (7)
must apply to the maximal phrase of that type. Under this analysis the subject
complement is immediately dominated by NP, so that the WH-FRONTING
rule cannot extract any NP which is contained within the subject complement.
This condition does not apply to the extraposed complement sentence,
however, and thus (6b) and (6c) are acceptable. It is not clear whether
the A-over-A principle could be extended to the deletion case of (8).

Furthermore, and more importantly, Chomsky (1968:46-47) notes that
there are a number of cases which require that changes in the A-over-A
constraint be made, and cites Ross' evidence (1967) that there are cases
which could be handled by the A-over-A constraint only with ad hoc modifications.
He concludes that "perhaps this indicates that the approach
through the A-over-A principle is incorrect, leaving us for the moment
with only a collection of constructions in which extraction is, for some
reason, impossible." We believe that there is evidence that the reason
is the Freezing Principle.

Similarly, Ross (1967:243) proposes the "Sentential Subject Constraint"
to account for the failure of WH-FRONTING and other movement rules to apply
to a constituent within a sentential subject:

SSC: "No element dominated by an S may be moved out of that
S if that node S is dominated by an NP which itself is
immediately dominated by S."

As we will show, this constraint is not sufficiently general to account
for the entire range of data subsumed by the Freezing Principle.

To see how the Freezing Principle predicts these data, we make use
of Emonds' (1970) analysis, in which (9b) is derived from (9a) by means
of a rule of SUBJECT REPLACEMENT.⁷

(9) [Phrase-markers not reproduced: in (a), S immediately dominates
NP (it) and VP, and the complement sentence lies within VP; SUBJECT
REPLACEMENT derives (b), in which the complement sentence occupies
subject position.]

Since S₀ now immediately dominates S₁ VP and S → S VP is not a base rule,
S₀ is frozen. Thus no element of S₁ may be moved and thus (7b) and (7c) are
ungrammatical.

So far, looking at just these data, on the purely descriptive level
there is no reason to prefer either the Sentential Subject Constraint or
the Freezing Principle. But now notice

(10a) It is obvious [S that John is going to need some help].

(10b) *Is [S that John is going to need some help] obvious?

To derive (10b), first apply SUBJECT-REPLACEMENT, freezing S, and then
INVERSION. The Freezing Principle predicts that (10b) is ungrammatical,
since the structure to which INVERSION applies in (10b) is frozen. The
Sentential Subject Constraint, however, does not make this prediction.

Ross (1967:57) accounts for (10b) with the following output condition:
"Grammatical sentences containing an internal NP which exhaustively
dominates S are unacceptable". Thus Ross' two constraints, which we have
called generalizations from the data (as opposed to theoretical propositions),
are accounted for nicely by the Freezing Principle. We would say that these
data in themselves would force us to prefer the Freezing Principle. But the
situation is even more clear-cut, for there are related data which none of
Ross' principles account for, but which are predicted by the Freezing
Principle. These are⁸

(11a) How obvious is it [S that John is going to need some help]?

(11b) *How obvious is [S that John is going to need some help]?

(11c) How necessary is it [S for John to leave]?

(11d) *How necessary is [S for John to leave]?

Once again, SUBJECT-REPLACEMENT freezes the entire sentence, so that
the adjective phrase may not be moved, according to the Freezing Principle.
Since nothing has been moved out of the subject, the Sentential Subject
Constraint does not apply, and since the sentential complements in (11b)
and (11d) are not internal, Ross' output condition does not apply. Thus
not only does the Freezing Principle predict all the data that Ross' two
constraints predict, but it predicts data that Ross' constraints cannot
predict.

Another case involves the transformation which derives (12b) from
(12a) (cf. Chomsky 1970 for discussion).

(12a) John's pictures
(12b) the pictures of John's

Alongside (12b) we observe the construction exemplified by (12c).

(12c) the pictures of John

While (12c) corresponds to a possible base structure, and may in fact
be a base generated structure, (12b) is derived by a transformation which
clearly causes freezing. Hence the Freezing Principle predicts that it
should be possible to question the object of the preposition of in a construction
like (12c), but not in a construction like (12b). This prediction
is correct, as the examples below show.

(13a) Mary saw the pictures of who's ⇒ *Whose did Mary see the
pictures of?

(13b) Mary saw the pictures of who ⇒ Who did Mary see the pictures
of?

As a last case consider the dative construction in English. As we show
in Culicover and Wexler (1973), after the DATIVE transformation has applied,
deriving (14b) from (14a), no other transformation, such as WH-FRONTING, for
example, can apply to the indirect object. However, these transformations
can apply to the indirect object if DATIVE has not applied.⁹

(14a) John gave a book to Bill.

(14b) John gave Bill a book.

(14c) What did John give to Bill?

(14d) Who did John give a book to?

(14e) What did John give Bill?

(14f) *Who did John give a book?

These judgments are generally accepted in the literature, but have resisted
explanation. Langendoen (1973), in fact, noting that the data cannot be
explained by rule ordering, suggests two special ad hoc conditions either
of which could explain the data and then writes, "Either way, the solution
seems inelegant and ad hoc, and one is led to question the grammaticality
judgments which motivated them in the first place". Of course, if it
happens too often that the intractability of an analysis requires judgments
to be questioned, then the entire empirical basis of linguistics is gone.
Thus it is intriguing that the Freezing Principle provides a natural solution
to this problem with no change at all in the data. Assume that (14b)
is derived from (14a) as in (15).

(15) (a) [S [NP John] [VP [V gave] [NP [Det a] [N book]] [PP [P to] [NP Bill]]]]

         ⇒ DATIVE

     (b) [S [NP John] [VP [V [V gave] [NP Bill]] [NP [Det a] [N book]]]]

Since there is no base rule of the form V → V NP, the upper V node in (15b)
is frozen, and thus WH-FRONTING cannot move the NP dominated by V and thus
(14f) is ungrammatical by the Freezing Principle. But since the NP a book
is not frozen, (14e) is grammatical.¹⁰

But apparently there is some "dialect" variation in these judgments.
Hankamer (1973) finds sentences like (14e) ungrammatical, although he
otherwise accepts these judgments. That is, after DATIVE, Hankamer cannot
question either the direct or indirect object.¹¹ Note that exactly this
pattern of grammaticality judgments will result if the upper V node of (15b)
is changed to a VP, as in (16).

(16) [S [NP John] [VP [VP [V gave] [NP Bill]] [NP [Det a] [N book]]]]

Since there exists no rule in the base of the form VP → VP NP, the upper
VP in (16) will be frozen and thus, by the Freezing Principle, neither the
indirect object nor the direct object may be questioned, thus predicting
this second pattern of judgments.

But how is a learner to choose between (15b) and (16)? If (16) were
indeed correct (i.e., was being used by the speakers from whom he was
learning the language), and if the learner had decided on an analysis of
the form (15b), then, if there is no correction of ungrammatical utterances,
the learner will never have reason to change his analysis.¹²

In short, the data, together with the language learning procedure,
might not determine whether (15b) or (16) is correct. There might be a
general constraint which determines that when Chomsky-adjunction takes place,
inserting a node between X and Y (with X dominating Y), then the new node
is always called Y, as in (15b). If the judgments listed in (14) are
correct, then this constraint seems reasonable. If the mentioned "dialect"
variations actually exist, then the constraint possibly is not correct,
and the learner may be free to choose either X or Y as the name for the new
node.¹³
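
The consequences of the two labelings of the adjoined node can be compared mechanically. The sketch below is illustrative only: the five-rule base is a hypothetical toy base chosen so that VP → V NP and NP → Det N are base rules while V → V NP and VP → VP NP are not, and the dictionary representation and `name` tags are assumptions made for the example.

```python
# Hypothetical toy base for the dative examples.
BASE_RULES = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("NP", ("Det", "N")),
    ("PP", ("P", "NP")),
}


def node(label, *children, name=None):
    return {"label": label, "children": list(children), "name": name}


def questionable(tree, blocked=False):
    """Tagged nodes that are not dominated by a frozen node."""
    found = set()
    if tree["name"] and not blocked:
        found.add(tree["name"])
    shape = (tree["label"], tuple(c["label"] for c in tree["children"]))
    frozen = bool(tree["children"]) and shape not in BASE_RULES
    for child in tree["children"]:
        found |= questionable(child, blocked or frozen)
    return found


def after_dative(adjoined_label):
    """'John gave Bill a book'; only the label of the adjoined node varies.

    "V" gives the structure assumed for (15b), "VP" the one assumed for (16).
    """
    inner = node(adjoined_label, node("V"), node("NP", name="indirect"))
    direct = node("NP", node("Det"), node("N"), name="direct")
    return node("S", node("NP", name="subject"), node("VP", inner, direct))


pattern_15b = questionable(after_dative("V"))    # direct object still available
pattern_16 = questionable(after_dative("VP"))    # neither object available
```

With either label the indirect object is unavailable, so under these assumptions neither analysis yields a pattern in which the indirect object alone can be questioned.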

Note the power of the Freezing Principle here. Although it allows
both sets of grammaticality judgments, it does not allow a third set, in
which one could move the indirect object after DATIVE, but not the direct
object, that is, one in which (14e) is ungrammatical and (14f) grammatical.
This is because there is no way of stating the transformation so that a
node dominating the direct object is frozen, but not a node dominating the
indirect object. So there is a formal, precise prediction that this
third dialect cannot exist, and so far as we know this pattern does
not exist for any native speaker.

C. Rule-ordering

We have also found that there is considerable reason to believe that
transformations need not be extrinsically ordered if one assumes that the
Freezing Principle is a constraint which is operative in natural language.
It should be evident that the goal of dispensing completely with extrinsic
ordering would be a desirable one to attain, provided that it is consistent
with the empirical evidence.

To consider a particular example, let us return to sentences involving
extraposed and non-extraposed sentential complements. It turns out that it
is impossible to delete a that-complementizer if the complement appears in
subject position.

(17a) It is obvious {that, ∅} Mary was here yesterday.

(17b) {That, *∅} Mary was here yesterday is obvious.

In order to block the deletion of that in the sentential complement one
might order the rule of THAT-DELETION after SUBJECT REPLACEMENT. Alternatively,
if one wished to argue that the rule relating (17a) and (17b) was
EXTRAPOSITION, where the underlying constituent order is that of (17b), then
one would order THAT-DELETION after EXTRAPOSITION. Presumably the structural
description of THAT-DELETION would be stated in either case so that it
could not apply when the complement was in subject position.

However, observe that if the Freezing Principle is assumed, then
the transformations need not be ordered in the SUBJECT REPLACEMENT analysis.
If SUBJECT REPLACEMENT applies first, then THAT-DELETION is blocked by the
frozen structure. If THAT-DELETION applies first, then either the resulting
structure is frozen, or else the resulting structure fails to meet the
structural description of SUBJECT REPLACEMENT, depending on independent
requirements of the analysis. On the other hand, it can be seen that such
an explanation is impossible in terms of the EXTRAPOSITION analysis. Hence
the Freezing Principle, for this body of data at least, permits us to do
without extrinsic rule ordering, and in doing so, leads to an unambiguous
interpretation of the data.

Another example involves the interaction between DATIVE and COMPLEX NP
SHIFT (noted by Ross 1967:53ff). In its most general statement COMPLEX NP
SHIFT moves an NP to the end of the VP which dominates it. However, this
rule cannot apply after DATIVE has applied.

(18a) I gave a book about spiders to the man in the park.

(18b) I gave to the man in the park a book about spiders.

(19a) I gave the man in the park a book about spiders.

(19b) *I gave a book about spiders the man in the park.

One way to rule out (19b) would be to order COMPLEX NP SHIFT before DATIVE.
Application of COMPLEX NP SHIFT would then destroy the environment for the
later application of DATIVE. However, since both DATIVE and COMPLEX NP
SHIFT cause freezing at the VP which dominates the two objects, the application
of either transformation will block the later application of the
other if the Freezing Principle is assumed. Hence it will be unnecessary
to state an extrinsic ordering of the two rules.

Finally, consider Emonds' (1970) list of "root" transformations in
English.

Directional adverb preposing EX: Away John ran.

Negated constituent preposing EX: Never will anyone do that.

Direct quote preposing EX: "John is a fink," Bill

Non-factive complement preposing EX: John is a fink, Bill assumes.

Topicalization EX: Beans I hate.

VP Preposing EX: John said I would like her, and like her I do.

Left dislocation EX: John, he really plays the guitar well.

Comparative substitution EX: Harder to fix would be the faucet.

Participle preposing EX: Standing in the doorway was a witch.

PP substitution EX: In the doorway stood a witch.

As Emonds points out, only one of these transformations may apply in any
derivation. This condition follows as a consequence of the Freezing
Principle, if one makes the reasonable assumption that each of these
transformations causes freezing at the S-node to which it applies. Observe that
in this case it is simply impossible to find an extrinsic ordering of all of
the rules mentioned which will account for the fact that only one of them may
apply at a given S. Hence not only does the Freezing Principle permit us to
do away with a number of cases where extrinsic ordering would otherwise be
required, but it accounts for a situation in which rule ordering alone is
not adequate to account for the data.
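
The point about root transformations can be illustrated with a small sketch. It is a toy model and not the authors' formalism: the class SNode, the function apply_root, and the assumption that every root transformation freezes the S at which it applies are introduced here purely for illustration. Under those assumptions, whichever order the rules are tried in, exactly one of them applies.

```python
from itertools import permutations


class SNode:
    """A clause node that records whether it has been frozen."""

    def __init__(self):
        self.frozen = False
        self.applied = []


def apply_root(rule, node):
    """Apply `rule` at `node` unless the node is already frozen."""
    if node.frozen:
        return False
    node.applied.append(rule)
    # Assumption: each root transformation freezes the S it applies to.
    node.frozen = True
    return True


# A few of the root transformations named in the text.
ROOT_RULES = ["TOPICALIZATION", "VP PREPOSING",
              "LEFT DISLOCATION", "PP SUBSTITUTION"]


def rules_that_apply(order):
    """Try the rules in the given order at a fresh S node."""
    node = SNode()
    for rule in order:
        apply_root(rule, node)
    return node.applied


# Every possible ordering of the four rules.
results = [rules_that_apply(order) for order in permutations(ROOT_RULES)]
print(all(len(applied) == 1 for applied in results))
```

No extrinsic ordering is consulted anywhere in the sketch; the "at most one" effect comes entirely from the frozen flag, which is the shape of the argument made in the text.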


IV.  Semantics 

A.  The  Invariance  Principle 

The role of semantics in the linguistic system must be analyzed
carefully, because, in addition to the necessity of providing an adequate
descriptive semantics, we must understand how meaning helps to provide
structural information to the language learner. As a first step we assumed
the Universal Base Hypothesis, which says that there is one syntactic base
for all languages. But, of course, since languages have different syntactic
deep structures (e.g., not all languages are SVO), this assumption must be
modified. In Wexler and Culicover (1974) we modify this assumption along
lines which have been previously suggested. We assume that there is a
"semantic" structure which is hierarchical but not ordered from left to
right, and we assume that this structure is related to the syntactic deep
structure in a very constrained way: the hierarchical relations in the
semantic representation are retained in the syntactic deep structure,
although any left-to-right order, given this constraint, is acceptable.
This constraint is called the Invariance Principle, because the grammatical
relations are assumed to be invariant from semantic to syntactic structure.
As an artificial example, suppose the semantic representation has the
unordered structure in (20a). Then any of the four ordered deep structures
in (20b) is possible, by the Invariance Principle.


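(20a), (20b)  [Tree diagrams not recoverable from the source: (20a) is an
unordered structure rooted in A; (20b) gives its ordered variants.]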


We also assume that the "semantic grammar" is universal, but that
natural languages differ in which ordered deep structure they have. All
of these deep structures are related, however, by the Invariance
Principle. This is a very strong assumption, and has the virtue that it
allows the deep structures of a language to be learned by a fairly simple
learning procedure. But although this is such a strong assumption, there
is considerable evidence for it. This evidence is presented in Culicover
and Wexler (1974b), where data from 218 languages is considered.

The evidence takes the form of predictions about universals of word
order. For example, suppose the universal unordered semantic
representation for the Noun Phrase is


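(21)  [Tree diagram lost in extraction. The eight orders listed below
correspond to the bracketing [Det [Num [Adj N]]].]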


There is evidence that the ordered form of this structure as shown in (21) is
correct for English. Then, the Invariance Principle predicts that only eight
deep structure orders are possible for the four categories Det, Num, Adj, N;
namely those obtained by permuting each branch of the structure. Thus the
possible orders are Det Num Adj N, Num Adj N Det, Det Adj N Num,
Adj N Num Det, Det Num N Adj, Num N Adj Det, Det N Adj Num, and N Adj Num Det.

Without constraints, of course, there are 4! = 24 orders of the four
categories available. Therefore the prediction that only 8 are possible
is a strong prediction. In Culicover and Wexler (1974) we find that, of
all the languages for which adequate data is available, there is only one
exception to this prediction, that is, only one order of these constituents
which is not in the eight predicted ones. All the other languages have
an order which is one of the eight predicted ones.
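
The count of eight orders out of twenty-four can be checked mechanically. The sketch below assumes the bracketing [Det [Num [Adj N]]], which is inferred here from the eight orders listed in the text rather than read off the original diagram; the names NP_TREE and orders are illustrative. Each branching node may place its two daughters in either order, and nothing else may move.

```python
from itertools import permutations

# Assumed bracketing for the Noun Phrase, inferred from the listed orders.
NP_TREE = ("Det", ("Num", ("Adj", "N")))


def orders(tree):
    """Return every left-to-right order obtained by permuting sisters."""
    if isinstance(tree, str):
        return {(tree,)}
    left, right = tree
    result = set()
    for l in orders(left):
        for r in orders(right):
            result.add(l + r)  # daughters in the order given
            result.add(r + l)  # daughters in the reverse order
    return result


predicted = orders(NP_TREE)
unconstrained = set(permutations(["Det", "Num", "Adj", "N"]))
excluded = unconstrained - predicted

for order in sorted(predicted):
    print(" ".join(order))
print(len(predicted), "of", len(unconstrained), "orders are predicted")
```

An order such as Det N Num Adj falls in the excluded set, because N and Adj are sisters and cannot be separated by Num without changing the hierarchy.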

Thus note that the Invariance Principle together with the assumed
universal semantic representation makes very strong predictions which can be
confirmed. In Culicover and Wexler (1974) we also confirm the predictions
for a number of other structures.

All of this evidence is used to support both the Invariance Principle
and the assumed universal semantic representation, which is hierarchically
structured (i.e., it is like, though in detail different from, an
unordered version of traditional context-free deep structures for English).
There have been a number of other proposals in the literature for the form
of the "semantic base", most of them being more similar to a version of the
predicate calculus notation (e.g., Lakoff 1970a) or a case system (e.g.,
Fillmore 1968). It is important to note that none of these proposals can
satisfy the Invariance Principle, and that, so far as we can see, they cannot
(without numerous ad hoc assumptions) make the strong predictions about
universals of word order in Culicover and Wexler (1974). Thus we have
evidence that the traditional structured deep structure is correct.

To take another example, note that the Invariance Principle, together
with the assumption that the semantic grammar rewrites S as NP-VP,
where the VP is expandable as either V or V-NP, predicts that if the subject
of a sentence precedes the V in a transitive sentence then the subject must
precede the V in an intransitive sentence. Once again our data completely
confirm this prediction, and there is no non-ad hoc way for the predicate
calculus formulations to predict these phenomena.
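
The subject-position prediction can be sketched in the same way. The toy enumeration below covers only sister permutation for the two rules named in the text (S rewritten as NP-VP, VP as V or V-NP); the function names and the labels Subj and Obj are illustrative assumptions, and the sketch makes no claim about which transitive orders the fuller analysis in Culicover and Wexler admits. It only checks that, once a language fixes the order of the daughters of S, subject position relative to V is the same in transitive and intransitive sentences.

```python
from itertools import product


def linearize(s_order, vp_order, transitive):
    """Spell out one deep-structure order for a choice of sister orders.

    s_order:  "NP-VP" or "VP-NP" (order of the daughters of S)
    vp_order: "V-NP" or "NP-V"   (order of the daughters of VP)
    """
    vp = ["V"]
    if transitive:
        vp = ["V", "Obj"] if vp_order == "V-NP" else ["Obj", "V"]
    return ["Subj"] + vp if s_order == "NP-VP" else vp + ["Subj"]


def subject_first(order):
    """True if the subject precedes the verb in this order."""
    return order.index("Subj") < order.index("V")


grammars = []
for s_order, vp_order in product(["NP-VP", "VP-NP"], ["V-NP", "NP-V"]):
    trans = linearize(s_order, vp_order, transitive=True)
    intrans = linearize(s_order, vp_order, transitive=False)
    grammars.append((trans, intrans))
    print(" ".join(trans), "|", " ".join(intrans))

# The prediction: subject position relative to V never differs.
consistent = all(subject_first(t) == subject_first(i) for t, i in grammars)
print(consistent)
```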

The kind of counter-example to these claims that might occur to the
linguist is a so-called "subjectless" language, in which, it has been
argued, there is no deep subject-predicate structure. But the existence
of these languages has, it seems to us, not been at all demonstrated. In
Culicover and Wexler (1974) we analyze Kapampangan, a language which it is
claimed is subjectless, and show that an analysis which assumes an underlying
subject-predicate division accounts more readily for a number of interesting
grammatical phenomena in the language than does a "subjectless" analysis, for
example, Mirikitani's (1972).

Thus there is evidence that the Invariance Principle is correct. It
is also true that, given the constraints imposed by the Invariance Principle,
the (ordered) deep structure rules are quite easily learnable (Wexler and
Culicover 1974), which, of course, is a goal of the analysis.

B. Semantic adequacy

There is one other very important kind of analysis which must be made
to justify the system, and this is to provide evidence that the semantic
structures which the Freezing Principle and Invariance Principle force us to
assume are in fact descriptively adequate.

Application of the Freezing Principle places very strong restrictions
on what the deep structure configuration of a sentence may be, given the
appropriate kinds of information about what the transformational mapping
between the deep structure and the surface structure must account for.

Hence the assumption that hierarchical arrangements in deep structures
and semantic structures are preserved by the mapping between them (the
Invariance Principle), together with the predictions about deep structures
made by the Freezing Principle, serves to make quite explicit predictions
about the nature of semantic structures. It is necessary to show that
the theory sketched out above is in fact explanatorily adequate, in that
it leads directly to a descriptively adequate semantic account. In other
words, we wish to show that the semantic structures which we arrive at are
the correct ones in terms of the interpretations assigned to them by the
semantic component. Our results in this area are somewhat tentative, so
we must restrict our remarks here to a discussion of the direction in
which such an investigation might lead.

1. The extensionality of the subject.

Let us say, following a traditional terminology of modern logic, that
the extension of an expression is its reference, where the extension of a
sentence is either truth or falsity depending on whether the sentence is
true or false. Let us also say that the intension of an expression is a
function defined in the semantic component which assigns to each expression
its extension if it has one.

An  opaque  context  is  one  in  which  a sub-expression  of  an  expression 
need  not  have  an  extension  in  order  for  the  entire  expression  to  have  an 
extension.  One  such  example  is  (22). 

(22)  John  is  looking  for  a unicorn. 

(22) may be true or false even if there is no such thing as a unicorn.
There is a second reading, of course, in which a unicorn must exist.

Montague (1973) represents this ambiguity of an expression such as
(22) in the following way. In the syntactic derivation of the sentence
the direct object of the verb is looking for may be either the intension
of a unicorn, which we may represent here informally as a unicorn*, or
the object of the verb may be a variable expression he, whose intension
may be represented informally as he'. In the latter case the surface
structure of the sentence is derived by replacing the expression he by
the expression a unicorn. Thus the sentence is syntactically as well as
semantically ambiguous, by virtue of the fact that it has two derivations.
(In fact it has several more which do not lead to further semantic
ambiguity.) Associated with the two derivations are different rules of
semantic interpretation, so that the semantic structure associated with the
sentence is different depending on the syntactic rules which participate
in the derivation. The two syntactic derivations are given informally as
(23a) and (23b) respectively, while the corresponding semantic
representations are given informally as (24a) and (24b) respectively.


(23a)

John is looking for a unicorn
    John
    is looking for a unicorn
        is looking for
        a unicorn


(23b)

John is looking for a unicorn
    a unicorn
    John is looking for he
        John
        is looking for he
            is looking for
            he


(24a)  John* (is looking for* (a unicorn*))

(24b)  ∃x (unicorn*(x) & (John* (is looking for* (x))))


In essence, the device of introducing a noun phrase in the syntactic
derivation outside of the context of the verb is looking for permits Montague
to maintain in principle the semantic ambiguity by keeping the translation
into the semantic representation of a unicorn within the context of the
verb in the first case, and outside of it in the second case.

In fact, however, most verbs do not possess this property of permitting
their direct object to be intensional. In a case where there is a
non-intensional verb, such as hit or saw, Montague applies a meaning postulate
which "maps" the semantic representation of the form (24a) into the semantic
representation of the form (24b). This rule is inapplicable just in case
the verb is one like is looking for.

It is clear that this is not a logically necessary analysis of the
data. It is certainly possible to imagine an alternative formulation, in
which there is only one syntactic derivation of the simple sentence, and in
which there is a semantic rule which obligatorily derives semantic
representations such as (24b) from those like (24a), except when the verb is
of the type is looking for, in which case the rule applies optionally.
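
The alternative formulation just described can be sketched as a single function. This is an illustration under stated assumptions, not the authors' rule system: the name semantic_readings, the toy lexicon INTENSIONAL_VERBS, and the string rendering of the representations (modelled on (24a) and (24b)) are all introduced here. The rule deriving the (24b)-type form is obligatory for ordinary verbs and optional for verbs like is looking for.

```python
# Assumption: a toy lexicon of verbs whose objects may be intensional.
INTENSIONAL_VERBS = {"is looking for"}


def semantic_readings(subject, verb, noun):
    """Return the semantic representations available for 'subject verb a noun'.

    There is one syntactic derivation. The (24a)-type form is built first;
    the rule deriving the (24b)-type form applies obligatorily unless the
    verb is intensional, in which case it applies optionally.
    """
    form_a = f"{subject}* ({verb}* (a {noun}*))"
    form_b = f"∃x ({noun}*(x) & ({subject}* ({verb}* (x))))"
    if verb in INTENSIONAL_VERBS:
        return [form_a, form_b]  # optional rule: both readings survive
    return [form_b]              # obligatory rule: only the derived form


for verb in ("is looking for", "hit"):
    print(verb, "->", semantic_readings("John", verb, "unicorn"))
```

On this formulation the ambiguity of (22) is located entirely in the semantic rule, which is the property the following paragraphs argue for.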

Application  of  the  Invariance  Principle  leads  us  to  favor  the  second 
alternative.  There  is  no  syntactic  evidence  to  suggest  that  a possible 
deep  structural  analysis  of  (22)  is  that  given  in  (25)  below. 


(25)  [Tree diagram not recoverable from the source; the only surviving
fragment is "is looking for it".]


If this is the correct analysis of the syntactic data, as we believe it is,
the Invariance Principle will not in itself lead us to two semantic
representations for a sentence such as (22). It is worth asking, therefore,
whether there is any evidence that the second alternative formulation of
the ambiguity of (22) is in principle the correct one.

It is important to point out that in Montague's analysis the first
level of semantic representation is one in which all noun phrases are
translated into their corresponding intensional expressions. As Montague
correctly points out, there are no verbs such that the subjects of such
verbs may not be further translated into extensional expressions. We have
already seen that there are verbs whose objects may not be so translated,
however. Consequently Montague is forced to state two rules, one of which
extensionalizes the direct objects of non-intensional verbs (such as hit,
see, etc.) and the other of which extensionalizes the subjects of all verbs.
This formulation, as can be seen, is ad hoc in that it provides no
explanation for why it should be that subjects are always extensional but
objects are not.

Furthermore, Montague uses a device of reducing the primary semantic
representations to representations of the form of the predicate calculus
with a function (argument, argument,...) structure. Hence he finds it
necessary to state one rule of extensionality for expressions with one
argument, another for expressions with two arguments, and he would
presumably have had to state one rule for expressions with three arguments,
another rule for expressions with four arguments, and so on, had he
extended his analysis to more complex types of expressions. The crucial
infelicity of such an approach is that it fails to explain why it should
be that the subject is always extensional regardless of the form of the
expression. While it is certainly possible to express this fact within
Montague's framework, it does not follow as a necessary consequence of
the analysis.

A notable characteristic of Montague's approach to the translation of
expressions with syntactic structure into semantic representations is that
the basic structure of the expression is preserved in the primary semantic
representation. The mapping in his framework therefore conforms to the
Invariance Principle. Furthermore, the syntactic structure is one which
displays the subject/predicate split, and this split is therefore preserved
in the primary semantic representation. It is only at a secondary level
that Montague reduces the semantic representation to an expression which
closely conforms to the type of representation traditionally employed in
the predicate calculus. It seems to us, however, that it is not logically
necessary to perform this reduction of structure in a semantic component
whose goal is to provide a precise characterization of the notion of truth.
That such a reduction may even be wrong is shown by the fact that it
destroys the structure which might otherwise serve to contribute to a
precise and general characterization of opaque contexts.

A first approximation to a solution of the problem would be the
following: First, formulate an hypothesis about what constitutes an opaque
context in terms of the structure in which the element which creates this
context participates. Second, state a semantic rule which is sensitive
to the presence of an opaque context which will account for the ambiguity
of an expression which contains one at the semantic level. Third, show
that this definition is extendible to a wide variety of expressions,
and that it can be used as a diagnostic for semantic structure. Fourth,
show that the semantic structures arrived at in this way are appropriately
related by the Invariance Principle to the syntactic structures arrived at
by independent application of the Freezing Principle to the
transformational component.

2.  Definition  of  an  opaque  context. 


Let us return to example (22).

(22)  John is looking for a unicorn.

We assume that the syntactic structure of (22), and hence its semantic
structure exclusive of constituent order, is as in (26).


(26)  [S [NP John] [PRED [AUX Pres be ing] [VP [V look for] [NP a unicorn]]]]


Let us refer to expressions such as look for as opacity causing elements,
or OCE's. What properties of the structure will permit us to distinguish
between the subexpressions which are within the context of an OCE, and
those which are not?

The property which we would like to suggest is that of in construction
with. Klima (1964) defines in construction with as follows (p. 297),
rephrased slightly:


Definition: A constituent A is in construction with a
constituent B if A is dominated by the first
branching node which dominates B, and B does
not dominate A.

For the sake of clarity we will say that if A is in construction with B,
then B governs A. To illustrate, in (27) below A governs B, C and D, and
B is governed only by A. C and D govern one another.

(27)


Returning to (26), now, we find that governs serves to distinguish
between the NP John and the NP a unicorn in terms of their structural
relationship with the OCE look for. The former, which is outside of the
opaque context, is not governed by the V look for, while the latter,
which is inside of the opaque context, is governed by look for. On the
basis of this observation we may formulate the following definition of
what constitutes an opaque context.


Definition: An expression E is in an opaque context with
respect to an opacity causing element if
that element governs E.

It turns out that if a constituent A is governed by a constituent B then
every constituent which A dominates is also governed by B. If the definition
of an opaque context given above is correct, then, we would expect that any
constituent of a constituent in an opaque context is also within an opaque
context. This prediction is verified by examples such as the following:

(28a)  John is looking for a unicorn with two horns.

(28b)  John is looking for a unicorn with two horns that have
       blue and green polka dots on them.

(28c)  John is looking for a unicorn that can ride a bicycle.


As can be seen, not only is it the case that the unicorns defined in the
examples in (28) need not exist in order for the expressions to be true,
but neither do horns, horns with blue and green polka dots on them, blue
and green polka dots, or bicycles have to exist in order for these
expressions to be true. Since it is well-known that prepositional phrases
and relative clauses such as those found in the examples in (28) are
constituents of the NP's which they modify, these observations serve to verify
to some extent the prediction made by the definition of opaque context
which we have formulated above.
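
The definitions of in construction with, governs, and opaque context lend themselves to a direct check on small trees. The sketch below is illustrative only: the Node class and function names are introduced here, the tree for (28a) follows the structure assumed for (26) with a PP added inside the object NP, and the tree used for (27) is an assumption chosen to be consistent with the statements made about it in the text.

```python
class Node:
    """A labelled tree node with parent links."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` is a proper descendant of this node."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def first_branching_node(node):
    """Lowest ancestor of `node` that has more than one daughter."""
    ancestor = node.parent
    while ancestor is not None and len(ancestor.children) < 2:
        ancestor = ancestor.parent
    return ancestor


def governs(b, a):
    """B governs A iff A is in construction with B (Klima's definition)."""
    if a is b:
        return False
    branching = first_branching_node(b)
    return (branching is not None and branching.dominates(a)
            and not b.dominates(a))


# (28a): John is looking for a unicorn with two horns.
two_horns = Node("NP two horns")
unicorn = Node("NP", Node("NP a unicorn"),
               Node("PP", Node("P with"), two_horns))
look_for = Node("V look for")
john = Node("NP John")
sentence = Node("S", john,
                Node("PRED", Node("AUX"), Node("VP", look_for, unicorn)))

print(governs(look_for, unicorn))    # object NP: inside the opaque context
print(governs(look_for, two_horns))  # a constituent of the object NP
print(governs(look_for, john))       # subject NP: outside the context

# Assumed shape for (27): a root with daughters A and B, where B has
# daughters C and D.
c, d = Node("C"), Node("D")
a, b = Node("A"), Node("B", c, d)
root = Node("X", a, b)
governors_of_b = [n.label for n in (a, c, d) if governs(n, b)]
print(governors_of_b)
```

The second check reflects the inheritance property stated in the text: because the VP dominates everything inside the object NP, whatever look for governs, it governs together with all of its constituents.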

One further example will show how syntactic and semantic evidence
converge to require the same analysis. In Section III we showed how the
Freezing Principle explains many previously anomalous facts about the
DATIVE transformation. In order to explain these facts, a structure had
to be taken as basic which included the prepositional phrase, and the
other structure had to be derived from that. Thus (29b) must be derived
from (29a), and not vice versa, in order for the Freezing Principle to
correctly predict the phenomena.

(29a)  John promised a book to a woman.

(29b)  John  promised  a woman  a book. 

The  structure  underlying  (29a)  is  (30). 


(30)  [Tree diagram not recoverable from the source; the only surviving
label is "promised".]

But the semantic evidence supports this analysis also. Since promise
is an OCE, we predict that a referent need not exist for an NP which it
governs. Thus, assuming structure (30), the referent of a book need not exist.
On the other hand, since promise does not govern a woman in (30), the
referent of a woman must exist. These predictions are correct. In
other words, (29) is two ways ambiguous, the ambiguity depending on whether
or not a certain book had been promised.

But if (29b) were taken as basic, then these predictions would not
be made. Presumably both NP's (a book, a woman) would be in
construction with promise (in a "double object" construction) and the prediction
would be that (29a, b) were four ways ambiguous, which is not the case.

Thus  syntactic  and  semantic  evidence,  of  very  different  kinds,  converge 
on  one  analysis  and  lend  credence  to  the  joint  assumptions. 
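
The two-way versus four-way contrast is a matter of simple counting, sketched below. The function readings and the labels "must exist" and "need not exist" are illustrative assumptions: each NP governed by the OCE contributes two options, and each ungoverned NP contributes one.

```python
from itertools import product


def readings(nps, governed_by_oce):
    """Enumerate existence readings for the NPs of a sentence.

    An NP governed by the opacity causing element may or may not have a
    referent; an ungoverned NP must have one.
    """
    options = [
        ("must exist", "need not exist") if np in governed_by_oce
        else ("must exist",)
        for np in nps
    ]
    return [dict(zip(nps, combo)) for combo in product(*options)]


nps = ["a book", "a woman"]

# (29a) taken as basic: promise governs only "a book".
with_29a_basic = readings(nps, governed_by_oce={"a book"})

# (29b) taken as basic: both NPs would be in construction with promise.
with_29b_basic = readings(nps, governed_by_oce={"a book", "a woman"})

print(len(with_29a_basic), len(with_29b_basic))
```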

V.  Language  Acquisition  Data 

As  we  noted  at  the  beginning  of  this  article,  the  empirical  basis  for 
the  justification  of  our  theory  lies,  for  the  moment,  in  linguistic  data, 
rather  than  in  the  data  of  child  speech.  Our  approach  is  different  from 
the  one  usually  adopted  in  the  study  of  language  acquisition,  which  is  to 
study the language of children who have not yet acquired adult competence.
The  two  approaches  should  be  seen  as  complementary.  Ultimately,  of  course, 
we  hope  that  a more  direct  empirical  justification  could  be  found  for  our 
theory  in  data  concerning  child  language.  At  the  moment,  however,  we  must 
be  content  with  a situation  not  unheard  of  in  science,  in  which  indirect 
justification  is  all  that  is  available. 

Let us, however, consider ways in which our theory might make
contact with empirical data concerning child language. Logically, there seem
to be two ways in which this can happen. First, it might be possible to
test empirically various of the assumptions of the theory. Secondly, the
theory might make predictions about the course of language acquisition
which could be tested.

With respect to testing the assumptions of the theory, some of this
has already been accomplished. For example, we assume that the child is
not corrected for ungrammatical sentences, and, as we mentioned earlier,
this seems to be an empirical result (Brown and Hanlon 1970). Other
assumptions have not been tested so directly. For example, we assume that
the child hears sentences in situations which are clear enough for him to
be able to interpret the meaning without understanding the sentence.
Although so far as we know this assumption has not been directly tested,
it is certainly consistent with empirical results (e.g., Ervin-Tripp 1971,
Snow 1972) which show that children are spoken to simply (the assumption
being that, all other things being equal, the meaning of simple sentences
is easier to determine from the situation). The fact that our theory (with
the Freezing Principle) allows transformations to be learned from relatively
simple sentences is also consistent with the simplicity of input to the
child.

The second way in which the data of language acquisition might be
relevant to our studies is that our theory might make testable predictions about
the course of language acquisition. For example, the combination of our
assumptions about language and the learning procedure might make predictions
about which transformations developed first. This is a very subtle question,
however. The problem is that there are so many ways of changing parameters
(e.g., the order of input, the weighting of hypotheses, the pragmatic
importance of various transformations) that there may be no unique or small
collection of possible orders of development predicted by the theory. And,
with respect to transformations, it may be that the order of development
differs from child to child. Another important difficulty with respect to
making these kinds of predictions is that performance considerations (e.g.,
problems of short-term memory and the actual sentence generation mechanism
used by the child, what Watt (1970) calls the development of the "abstract
performative grammar") might have large effects on children's utterances,
as might aspects of cognitive development.

However, more subtle kinds of predictions might be made. For example,
it is a well-known observation (Bellugi-Klima 1968) that children
sometimes learn a transformation and use it correctly when no other
transformation is involved in the sentence, but when another transformation is needed,
both cannot be used together, and one is not applied. An example is
INVERSION and WH-FRONTING. Thus a child might say "Is your name Bill?",
thus demonstrating INVERSION, but also say "What your name is?", thus not
using INVERSION when WH-FRONTING is necessary. The suggested explanation
of these observations is that there is a performance limitation on the
child; namely, he can use only one transformation at a time. However, it
may be that the Freezing Principle can play a role in the explanation of
these phenomena. The child's grammar may be such that one of these
transformations causes freezing and blocks the other one. Thus both
transformations cannot apply together. This, of course, is not true of the adult
grammar, but the child must learn the appropriate statement of each
transformation. There is considerable room for error, even if he assigns the
surface string correctly in some cases.

We wish to emphasize that the above suggestion is only speculative,
and that very much analysis of the child's grammar would have to be
undertaken to make it a reasonable hypothesis. In particular, one would have
to find ways to tease it apart from the performance "one transformation
at a time" hypothesis. It is only mentioned to indicate the possibility
of the interaction of the syntactic portions of the theory with the data
of language acquisition.

Another example of how the theory can be used to make predictions
about the data of child language acquisition is provided by the problem
of word order in early child language. There is some difficulty in finding
relevant data because it is possible that the development of the base
grammar (i.e., the order of the elements that define grammatical relations) is
very fast, at least for the major categories. Thus one would have to
observe the child quite early in his linguistic development, right from
the start of the two-word stage, in order to capture data relevant to the
predictions. In fact, it is entirely consistent with the theory for the
child to make no production errors at all with respect to the order of the
deep structure constituents, since the procedure which learns this order
is quite simple and straightforward. In contrast with the procedure which
learns transformations, this procedure converges very quickly, and it is
quite conceivable that convergence has taken place before the child starts
to actually produce two-word utterances. So we require very subtle ways of
finding those few errors that do occur.

The base grammar that children develop will, of course, depend on the
base of the language that they are learning. But since many of the
sentences the child hears involve transformations, there is no reason to
suppose that necessarily all children learning a given language will pass
through exactly the same stages. In particular, a given learner might, at
some stage, posit an incorrect base grammar. However, if the learner is
obeying the constraints that we have proposed, namely the Invariance
Principle, then we can formally predict that there are certain patterns
that should never be observed. In particular, all the universals which we
have predicted for the base grammar of any language (Culicover and Wexler
1974b) should hold for a given stage of one language learner.

For example, we predict that no language would have (as deep
structures) VSO order for transitive sentences and SV order for
intransitives. Thus we predict that, since he is forming his grammars under the
constraint of the Invariance Principle, no child will simultaneously have
these orders for deep structures. (It is possible that at one time a child
has SVO and SV and at a later time VSO and VS.)
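
The prediction can be illustrated with a small sketch. It assumes, as a simplification that is ours and not a statement of the Invariance Principle itself, that the prediction reduces to requiring the same relative order of subject and verb in transitive and intransitive deep structures. The function names are hypothetical.

```python
def subject_verb_order(order: str) -> str:
    """Return 'SV' or 'VS' for an order string such as 'VSO' or 'SV'."""
    return "SV" if order.index("S") < order.index("V") else "VS"


def jointly_possible(transitive: str, intransitive: str) -> bool:
    """Simplified check (an assumption of this sketch): the two deep
    structure orders may co-occur only if S and V are ordered alike."""
    return subject_verb_order(transitive) == subject_verb_order(intransitive)


# The pairings mentioned in the text.
for pair in [("VSO", "SV"), ("SVO", "SV"), ("VSO", "VS")]:
    print(pair, jointly_possible(*pair))
```

Under this reading, VSO with SV is excluded, while SVO with SV and VSO with VS are both allowed, matching the cases discussed above.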

One can test this prediction by looking at reports of children's
utterances. Kernan (1969, 1970) has found that, in the two-word stage, a
Samoan child has VS and VO orders (Kernan actually uses a case grammar
description, but for these purposes this can be modified). Thus in
three-word sentences we would predict either VSO or VOS. In fact, the one
three-word utterance the child makes is VOS. Thus the prediction is confirmed.

Another more interesting case is Gia I in Bloom (1970). Gia at this
(early two-word) stage made (according to Bloom's criteria) 3 utterances
with a subject and a verb. They were "girl write" (in response to the
question "What's the little girl doing?") and two instances of "Mommy back".
The fact that in these intransitive verb cases the subject comes first
(i.e., they are SV) predicts, according to the Invariance Principle,
that the subject will come before the verb in transitive sentences. The
only other utterances with verbs that Gia makes at this stage are 3
utterances of the form OV (object verb), for example "balloon throw". Thus
we know that O comes before V. Now, in N plus N constructions (presumably
the V has been left out), Gia always puts the S before the O, that is, SO.
Thus since O comes before V and S comes before O, we know that S comes
before V in transitive sentences, which is the prediction made from the
Invariance Principle given the data that SV was the order in intransitives.
Thus Gia's order is consistent with the predictions made by the Invariance
Principle.
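
The step from pairwise orders to a full constituent order can be sketched as a small enumeration. This is only an illustration under our own assumptions: each observation is encoded as a pair (a, b) meaning "a precedes b", and the function name is hypothetical.

```python
from itertools import permutations


def consistent_orders(constraints):
    """List every ordering of S, O and V that respects all the given
    pairwise precedence constraints (a, b), read as 'a precedes b'."""
    result = []
    for perm in permutations("SOV"):
        position = {symbol: i for i, symbol in enumerate(perm)}
        if all(position[a] < position[b] for a, b in constraints):
            result.append("".join(perm))
    return result


# Gia I: SV in intransitives, OV utterances, SO in N plus N constructions.
gia = [("S", "V"), ("O", "V"), ("S", "O")]
# Samoan child (Kernan): VS and VO in the two-word stage.
samoan = [("V", "S"), ("V", "O")]

print(consistent_orders(gia))     # ['SOV']
print(consistent_orders(samoan))  # ['VSO', 'VOS']
```

With these encodings, Gia's data leave only SOV, and the Samoan two-word data leave VSO and VOS as the possible three-word orders.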

VI.  Summary 

In Section II we considered the nature of the constraints which
notions of learnability impose on the class of possible human languages,
and on the nature of the human language learning mechanism. Section III
dealt with some linguistic evidence to support the universals of syntax
which emerge from the learnability studies, namely the Freezing Principle
and the Binary Principle. In Section IV we discussed some theoretical
and empirical work in semantics.

The significance of semantic considerations rests on two crucial
aspects of the theory: first, our theory of language acquisition utilizes
semantics as a crucial component of information for the language learner;
second, any theory of syntax must provide structures which are consistent
with a theory of semantic interpretation.

In Section IV it was also shown how the Universal Base Hypothesis
may be replaced by a less restrictive hypothesis called the Invariance
Principle, which relates syntactic and semantic structures. Given the
Invariance Principle the base component of the grammar is learned by a
simple learning procedure. In addition, we discussed briefly the notion
that the Invariance Principle and the Freezing Principle taken together
make a number of very strong, and we believe correct, predictions
concerning universals of constituent order in human language.

In Section V we considered how various kinds of techniques used in
developmental psycholinguistics may be used to find empirical evidence
relevant to the learning theory. We also discussed several examples
which may prove to be fruitful upon further close examination.

Thus, the work reported on here represents research towards the
following objectives:

1. the specification of a theory of grammar of human
language, insofar as it is characterizable in terms of
formal linguistic structural universals;

2. the precise specification of a psychologically
plausible theory of the language learner;

3. the formal demonstration that the device specified in
2 above learns the grammar of any possible language
specified by 1 above;

4. the demonstration that the linguistic representations
and constraints arrived at in 1 above and the procedure
specified in 2 above, are empirically correct.

Given the fundamental correctness of the assumptions and arguments
summarized in this paper we would hope that the successful completion of
the work will simultaneously yield a theory of grammar, a theory of
language acquisition, a proof of their mutual compatibility, and further
empirical support for the entire theoretical apparatus and the
interactions between its components.

FOOTNOTES

*This work was partially supported by the Office of Naval Research
and  the  Advanced  Research  Projects  Agency,  ONR  Contract 
N 00014-69-A-0200-6006. 

1

The published work consists of Hamburger and Wexler (1973a), and
Wexler and Hamburger (1973). Hamburger and Wexler (1973b) will appear
in print shortly. The unpublished work consists of Culicover and Wexler
(1973a, b; 1974), and Wexler and Culicover (1973, 1974). The book in
preparation is Wexler, Culicover and Hamburger (in preparation).

2 

It is even conceivable, but highly speculative, that some formal
universals of language, for example, the Freezing Principle, are special
cases of a principle that applies in all cognitive domains, and that the
function of the principle in all these domains is the same: namely, it
makes the domains learnable. We know of no evidence for or against this
conjecture, which nevertheless suggests directions for research in other
fields. It is possible, however, that the nature of linguistic structure
may be sufficiently different from that of other cognitive domains to make
the search for something like the Freezing Principle a difficult one.

3 

An exception to this statement is Chomsky (1955, Ch. VIII especially),
in which the original constraints on transformations are proposed on the
basis of logical analysis (although not on the basis of formal learnability
considerations).

4 

As  presented,  for  example,  in  Chomsky  (1970). 

5

We are ignoring here the stages in the derivation prior to the
completion of lexical insertion. In this discussion, P_0 is assumed to be
the base phrase marker with all lexical items inserted.

6

Much of the following account is taken directly from Wexler and
Culicover (1973).

7

Higgins (1973) argues against Emonds' analysis, but we feel that
there is considerable value in trying to maintain Emonds' analysis in
light of the applicability of the Freezing Principle as shown in this
discussion. Many of the difficulties that Higgins points out can be
dealt with within the framework of the SUBJECT REPLACEMENT analysis.
Also, many of his arguments do not apply to the Freezing Principle
analysis.

8

Higgins (1973) notes this data.

9 

One serious problem with this analysis which we have discovered
thus far is that the PASSIVE transformation may apply to the output of the
DATIVE transformation, giving sentences like

(i) Mary was given a book by John.

In Culicover and Wexler (1973a) we suggest an explanation for this fact;
however, we do not find the explanation particularly satisfactory, and the
problem remains.

10

We believe, in fact, that (15a) is the correct structure. The structure
used in (4) is given for expository purposes only. In either case, none of
the arguments are affected.

11

He writes, "Ben Shapiro (personal communication) has found that some
people, like me, reject any sentence involving chopping either the direct
object or the indirect object; others accept some sentences in which the
direct object has been chopped, but reject sentences in which the indirect
object has been chopped."

12

We note here in passing that this possibility might provide a
mechanism in the child's learning procedure which will predict that over time
sentences of a certain kind will change from being ungrammatical to being
grammatical. Historical change, of course, provides a rich source of
phenomena to which this theory might be applicable, the point of view being
that much change is caused by the language learning mechanism, particularly
when more than one analysis is compatible with the data available to the
child and with the language learning procedure. It seems possible that the
theory can make precise predictions about what changes will take place.

13 

Thus this discussion does not make the usual assumption that in
Chomsky-adjunction the label of the new node is identical to that of the node
which it dominates. If we wished to maintain this assumption, however, then
there is an alternative account of Hankamer's judgments. Suppose that the
learner hypothesized that the output of DATIVE was (i).

(i) John [VP gave Bill the book]

If there is no base rule of the form VP → V NP NP then VP in (i)
will be frozen. Hence neither NP which this VP dominates will be moveable
by WH FRONTING.

The issue thus reduces to the question of whether only one type of
adjunction is possible, with a possible ambiguity in the labelling of the
newly created node, or whether there are two kinds of adjunction possible.
While we have no reason to prefer one over the other at this point, it may
well be that some of the learnability theorems can only be proved in the
case of one alternative and not the other.

14 

Thus  we  would  argue  that  this  must  be  a transformationally  derived 
order,  as  is  suggested  by  Venneman  (1973). 

15

Tom Wasow has informed us of an observation of Richard Oehrle's
concerning pairs of sentences like the following.

(i) John bought a cemetery plot for his grandchildren.

(ii) John bought his grandchildren a cemetery plot.

According to Oehrle, (ii) must have the interpretation that John's
grandchildren exist, while in the case of (i) John need not have any grandchildren
yet. Given that this is in fact the state of affairs, it follows first that
for causes opacity, and second that both (i) and (ii) are possible underlying
structures, i.e., there is no transformation of FOR-DATIVE. However, from
the second conclusion it follows that the transformation of DATIVE in the
case of verbs like give does not cause freezing since it derives a possible
base structure. Hence it may be necessary to account for the ungrammaticality
of *Who did you buy a book by some other device than the Freezing Principle.
This reformulation of the analysis of DATIVE would permit us to avoid the
problem with the PASSIVE transformation raised in footnote 9 above.

On the other hand, it seems to us that (i) can be analyzed as (iii).

(iii) John bought [NP a cemetery plot for his grandchildren].

If this is the case, then one might make the argument that the for which
undergoes FOR-DATIVE is not an opacity causing element, while the for which
appears in the NP in (iii) is. The difference between the two for's is
clear: the first is benefactive, while the second is purposive. The
following examples make the distinction apparent.

(iva) John bought a box for storing his toys.

(ivb) John bought a box for his mother.

(va) *John bought storing his toys a box.

(vb) John bought his mother a box.

Example (ivb) has two interpretations. Either John bought a box to give
to his mother (benefactive) or he bought a box for his mother to use
(purposive). The benefactive for, since it implies immediate transfer of
possession to the benefactee, requires the existence of the benefactee.
The purposive for, since it implies the use of the item by someone at an
unspecified time in the future, does not carry with it this requirement.

16

These data also show that the child probably has not yet completely
learned the deep structure order, since Samoan (according to Schwartz, 1972)
is VSO. Note that our theory does not explain why there is a two-word
stage. This may very well have to do with a memory limitation, as has been
suggested in the literature, or it may be a result of a child following
a certain testing strategy for discovering the order of deep structure
categories. (To our knowledge this latter hypothesis has not been mentioned
in the literature.) It may be that the child can get more useful information
about this order if he attempts to test the relative order of two categories
at once, rather than three or more categories, from the outset of learning.
To understand this question precisely, of course, would require considerably
more analysis.

17

Note that the only deep structure order consistent with these data
and the Invariance Principle is SOV, so we might hypothesize that this is
the order which Gia has established at this stage. That is, she has two
ordering rules, the first (subject precedes object) being correct for
adult English, and the second (object precedes verb) being incorrect
for adult English. Gia II, just 6 weeks later, has learned the correct
adult order. Thus it appears that this process is very fast, at least
for the major syntactic categories, once the child has reached the
two-word stage.